1. Loading the Data

2. Exploratory Data Analysis

a. Comparison among the data sets

Monthly data

The following features can be seen in the figure below.
  • Seasonality of the one-year period can be observed in some drugs (especially in N02BE, R03, and R06).
  • Each drug has different cycles and there are no uniform seasonality.
  • There is a significant drop in sales in January 2017 and October 2019.
# time plot
df_drug_monthly_fixed |> 
  autoplot(Sales) + 
  labs(title = "Monthly sales of drugs") +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2) +
  theme(legend.position = "none")

# Seasonal subseries plots
df_drug_monthly_fixed |> 
  gg_subseries(Sales) + 
  labs(title = "Monthly sales of drugs") +
  theme(legend.position = "none")

df_drug_monthly_fixed |> 
  ggplot(aes(x = as.factor(month(Month)), y = Sales)) +
  geom_boxplot() +
  facet_wrap(~ Drug, scales = "free_y", ncol = 2)

df_drug_monthly_fixed |> 
  gg_season(Sales)

Weekly data

The following features can be seen in the figure below.
  • Seasonality of the one-year period can also be observed in some drugs (especially in N02BE, R03, and R06) as we can see from the monthly data.
  • There is no significant drop in sales in January 2017 and October 2019, which means there may be some problems with aggregation.
# time plot
df_drug_weekly |> 
  autoplot(Sales) + 
  labs(title = "weekly sales of drugs") +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2) +
  theme(legend.position = "none")

# Seasonal subseries plots
df_drug_weekly |> 
  gg_subseries(Sales) + 
  labs(title = "weekly sales of drugs") +
  theme(legend.position = "none")

df_drug_weekly |> 
  ggplot(aes(x = as.factor(week(Week)), y = Sales)) +
  geom_boxplot() +
  facet_wrap(~ Drug, scales = "free_y", ncol = 1)

df_drug_weekly |> 
  gg_season(Sales)

Daily data

The following features can be seen in the figure below.
  • We cannot observe any weekly seasonality from the daily data.
# time plot
df_drug_daily |> 
  autoplot(Sales) + 
  labs(title = "daily sales of drugs") +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2) +
  theme(legend.position = "none")

# Seasonal subseries plots
df_drug_daily |> 
  gg_subseries(Sales, period = "week") + 
  labs(title = "daily sales of drugs") +
  theme(legend.position = "none")

df_drug_daily |> 
  ggplot(aes(x = weekdays(Date), y = Sales)) +
  geom_boxplot() +
  facet_wrap(~ Drug, scales = "free_y", ncol = 2)

df_drug_daily |> 
  gg_season(Sales, period = "week")

df_drug_daily |> 
  gg_season(Sales, period = "month")

df_drug_daily |> 
  gg_season(Sales, period = "year")

Hourly data

The following features can be seen in the figure below.
  • We can observe daily seasonality from the data however it can be estimated that it just comes from the business hours.
# time plot
df_drug_hourly |> 
  autoplot(Sales) + 
  labs(title = "hourly sales of drugs") +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 1) +
  theme(legend.position = "none")

# Seasonal subseries plots
df_drug_hourly |> 
  gg_subseries(Sales, period = "day") + 
  labs(title = "daily sales of drugs") +
  theme(legend.position = "none")

df_drug_hourly |> 
  ggplot(aes(x = as.factor(hour(Datetime)), y = Sales)) +
  geom_boxplot() +
  facet_wrap(~ Drug, scales = "free_y", ncol = 1)

df_drug_hourly |> 
  gg_season(Sales, period = "day")

From the above, it can be estimated that this data has a one-year seasonality, and therefore, weekly or monthly data are candidates for data that can be utilized in the forecast. Since there are some missing data of unknown reason in the monthly data, it is appropriate to use the weekly data in this analysis.

b. Moving average smoothing and Decomposition

In the following, monthly data is used to perform Moving average smoothing and STL Decomposition.

# remove the data of the last month
df_drug_monthly_fixed <- df_drug_monthly_fixed |> 
  filter(as.character(Month) != "2019 Oct")
## Moving average smoothing

# add the 12-MA and 12x2-MA to the data set
df_drug_monthly_fixed <- df_drug_monthly_fixed |> 
  group_by(Drug) |> 
  mutate(
    MA_12 = slider::slide_dbl(Sales, mean, 
                              .before = 5, .after = 6, .complete = TRUE),
    MA_12x2 = slider::slide_dbl(MA_12, mean, 
                                .before = 1, .after = 0, .complete = TRUE)
  ) |> 
  ungroup()

# plot 12x2-MA
df_drug_monthly_fixed |> 
  autoplot(Sales) +
  geom_line(aes(y = MA_12x2, color = "12-month Moving Average"), size =1) +
  labs(title = "Monthly Sales of Drugs with 12-month Moving Average") +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2) +
  theme(legend.position = "bottom")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

# plot standardized moving average 
df_drug_monthly_fixed <- df_drug_monthly_fixed |> 
  group_by(Drug) |> 
  mutate(stdl_MA = MA_12x2 / MA_12x2[as.character(Month) == "2014 Jul"]) |> 
  ungroup()

df_drug_monthly_fixed |> 
  autoplot(stdl_MA) +
  labs(title = "Standardized 12-month Moving Average",
       y = "Standardized Monthly Sales") +
  theme(legend.position = "bottom")
## Warning: Removed 96 rows containing missing values or values outside the scale range
## (`geom_line()`).



From the above plot, it can be seen that several drugs share a common trend.

M01AB and M01AE gradually increased and peaked until mid-2016, then gradually decreased until 2017, and then stabilized thereafter.


N02BA and N02BE reach their peak in mid-2016, as M01AB and M01AE do, but then the slope of the decline becomes steeper after that than M01AB and M01AE . N02BE then begins to rise sharply in mid-2017, while N02BA continues to decline, with the slope of the decline moderating



N05B and N05C both declined until mid-2015, then rose slightly, then declined again in mid-2017, and then continued to rise.


R03 and R06 both increased to a peak in mid-2016 and then declined for a while, but began to rise again in mid-2017 and have been rising consistently since then.

## Decomposition

# function for STL decomposition
STL_dcmp <- function(df, var, code) {
  df |>
    filter(get(var) == code) |>
    model(
      STL(Sales ~ trend(window = 13) +
            season(window = 21),
          robust = TRUE)
    ) |>
    components() |>
    autoplot() +
    ggtitle(paste("STL Decomposition:", code))
}

# apply STL decomposition to all unique drug code
lapply(unique(df_drug_monthly_fixed$Drug),
       function(code) {
         STL_dcmp(df=df_drug_monthly_fixed, var="Drug", code=code)
       }
)
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

## 
## [[7]]

## 
## [[8]]

Add some implications from the decomposition.

c. Stationary Analysis

In the following, weekly data is used.

## ACF and PACF
# ACF plot
df_drug_monthly_fixed |> 
  ACF(Sales) |> 
  autoplot() +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2) 

# PACF plot
df_drug_monthly_fixed |> 
  PACF(Sales) |> 
  autoplot() +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2) 

Add some implications .

d. ADF test

# function for ADF test
ADF_test <- function(drug_code) {
  print(drug_code)
  df_drug_monthly_fixed |>
    filter(Drug == drug_code) |>
    pull(Sales) |> 
    tseries::adf.test() |> 
    print()
}

# apply ADF test to all unique drug code
for (drug_code in unique(df_drug_monthly_fixed$Drug)) {
  ADF_test(drug_code)
}
## [1] "M01AB"
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_monthly_fixed, Drug == drug_code), Sales)
## Dickey-Fuller = -1.9394, Lag order = 4, p-value = 0.5998
## alternative hypothesis: stationary
## 
## [1] "M01AE"
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_monthly_fixed, Drug == drug_code), Sales)
## Dickey-Fuller = -2.4702, Lag order = 4, p-value = 0.3841
## alternative hypothesis: stationary
## 
## [1] "N02BA"
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_monthly_fixed, Drug == drug_code), Sales)
## Dickey-Fuller = -2.7674, Lag order = 4, p-value = 0.2633
## alternative hypothesis: stationary
## 
## [1] "N02BE"
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_monthly_fixed, Drug == drug_code), Sales)
## Dickey-Fuller = -4.9332, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
## 
## [1] "N05B"
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_monthly_fixed, Drug == drug_code), Sales)
## Dickey-Fuller = -3.2555, Lag order = 4, p-value = 0.08621
## alternative hypothesis: stationary
## 
## [1] "N05C"
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_monthly_fixed, Drug == drug_code), Sales)
## Dickey-Fuller = -2.9392, Lag order = 4, p-value = 0.1935
## alternative hypothesis: stationary
## 
## [1] "R03"
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_monthly_fixed, Drug == drug_code), Sales)
## Dickey-Fuller = -4.3962, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
## 
## [1] "R06"
## 
##  Augmented Dickey-Fuller Test
## 
## data:  pull(filter(df_drug_monthly_fixed, Drug == drug_code), Sales)
## Dickey-Fuller = -5.0795, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

Add some implications .

3. Forecasting

## Forecasting
# use 12 months as a test set  
train <- df_drug_monthly_fixed |>
  filter_index("2014 Jan" ~ "2018 Sep")

test <- df_drug_monthly_fixed |>
  filter_index("2018 Oct" ~ "2019 Sep")

# remove the data of the last month
df_drug_monthly_fixed <- df_drug_monthly_fixed |> 
  filter(as.character(Month) != "2019 Oct")

a. Fit the models

1. Baseline Models (Mean, Naïve and Seasonal naïve)

# Fit the models
base_fit <- train |>
  model(
    Mean = MEAN(Sales),
    `Naïve` = NAIVE(Sales),
    `Seasonal naïve` = SNAIVE(Sales)
  )

# Generate forecasts for 52 weeks
base_fc <- base_fit |> forecast(h = 12)

# Plot forecasts against actual values
base_fc |>
  autoplot(test, level = NULL) +
  autolayer(
    filter_index(df_drug_monthly_fixed, "2018 Nov" ~ .),
    colour = "black"
  ) +
  labs(
    y = "Sales",
    title = "Base Forecasts for monthly Drug Sales (for test data)"
  ) +
  guides(colour = guide_legend(title = "Forecast")) +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2)
## Plot variable not specified, automatically selected `.vars = Sales`

# calculate accuracy
base_ac <- accuracy(base_fc, test)

# select RMSE and MAPE and convert it to wide data
base_rmse <- base_ac |> 
  select(.model, Drug, RMSE) |> 
  pivot_wider(names_from = Drug, values_from = RMSE)

base_rmse
## # A tibble: 3 × 9
##   .model         M01AB M01AE N02BA N02BE  N05B  N05C   R03   R06
##   <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mean            18.1  26.2  31.4  312.  33.2  6.41 128.   49.3
## 2 Naïve           16.9  26.1  12.2  341.  31.3 10.8   98.2  48.3
## 3 Seasonal naïve  21.6  29.2  17.5  202.  45.6 11.5   89.9  37.8
base_mape <- base_ac |> 
  select(.model, Drug, MAPE) |> 
  pivot_wider(names_from = Drug, values_from = MAPE)

base_mape
## # A tibble: 3 × 9
##   .model         M01AB M01AE N02BA N02BE  N05B  N05C   R03   R06
##   <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mean            8.85  16.5  33.4  29.9  11.7  25.8  37.1  42.8
## 2 Naïve           8.44  14.4  10.1  37.5  10.5  42.3  38.4  43.7
## 3 Seasonal naïve  9.61  15.9  14.6  16.1  15.2  40.9  27.5  24.4

2. ARIMA Models

## ARIMA Models

# Fit the models
arima_fit <- train |>
  model(ARIMA(Sales))

arima_fit
## # A mable: 8 x 2
## # Key:     Drug [8]
##   Drug                      `ARIMA(Sales)`
##   <chr>                            <model>
## 1 M01AB                     <ARIMA(0,1,1)>
## 2 M01AE             <ARIMA(1,0,1) w/ mean>
## 3 N02BA          <ARIMA(0,1,4)(1,0,0)[12]>
## 4 N02BE          <ARIMA(1,0,0)(1,1,0)[12]>
## 5 N05B              <ARIMA(1,0,0) w/ mean>
## 6 N05C              <ARIMA(0,0,0) w/ mean>
## 7 R03   <ARIMA(0,0,1)(1,1,0)[12] w/ drift>
## 8 R06   <ARIMA(0,0,0)(0,1,1)[12] w/ drift>
# Generate forecasts for 52 weeks
arima_fc <- arima_fit |> forecast(h = 12)

# Plot forecasts against actual values
arima_fc |>
  autoplot(test, level = NULL) +
  autolayer(
    filter_index(df_drug_monthly_fixed, "2018 Nov" ~ .),
    colour = "black"
  ) +
  labs(
    y = "Sales",
    title = "ARIMA Forecasts for monthly Drug Sales (for test data)"
  ) +
  guides(colour = guide_legend(title = "Forecast")) +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2)
## Plot variable not specified, automatically selected `.vars = Sales`

# calculate accuracy
arima_ac <- accuracy(arima_fc, test)

# select RMSE and MAPE and convert it to wide data
arima_rmse <- arima_ac |> 
  select(.model, Drug, RMSE) |> 
  pivot_wider(names_from = Drug, values_from = RMSE)

arima_rmse
## # A tibble: 1 × 9
##   .model       M01AB M01AE N02BA N02BE  N05B  N05C   R03   R06
##   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA(Sales)  19.4  26.2  11.2  166.  34.1  6.41  75.5  27.8
arima_mape <- arima_ac |> 
  select(.model, Drug, MAPE) |> 
  pivot_wider(names_from = Drug, values_from = MAPE)

arima_mape
## # A tibble: 1 × 9
##   .model       M01AB M01AE N02BA N02BE  N05B  N05C   R03   R06
##   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA(Sales)  9.41  15.6  10.5  13.3  12.0  25.8  24.8  20.8

3. ETS Models

## ETS Models

# Fit the models
ets_fit <- train |>
  model(ETS(Sales))

ets_fit
## # A mable: 8 x 2
## # Key:     Drug [8]
##   Drug   `ETS(Sales)`
##   <chr>       <model>
## 1 M01AB  <ETS(A,N,N)>
## 2 M01AE  <ETS(M,N,N)>
## 3 N02BA  <ETS(M,N,M)>
## 4 N02BE  <ETS(M,N,M)>
## 5 N05B   <ETS(M,N,N)>
## 6 N05C  <ETS(A,Ad,N)>
## 7 R03    <ETS(M,N,A)>
## 8 R06    <ETS(M,N,M)>
# Generate forecasts for 52 weeks
ets_fc <- ets_fit |> forecast(h = 12)

# Plot forecasts against actual values
ets_fc |>
  autoplot(test, level = NULL) +
  autolayer(
    filter_index(df_drug_monthly_fixed, "2018 Nov" ~ .),
    colour = "black"
  ) +
  labs(
    y = "Sales",
    title = "ETS Forecasts for monthly Drug Sales (for test data)"
  ) +
  guides(colour = guide_legend(title = "Forecast")) +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2)
## Plot variable not specified, automatically selected `.vars = Sales`

# calculate accuracy
ets_ac <- accuracy(ets_fc, test)

# select RMSE and MAPE and convert it to wide data
ets_rmse <- ets_ac |> 
  select(.model, Drug, RMSE) |> 
  pivot_wider(names_from = Drug, values_from = RMSE)

ets_rmse
## # A tibble: 1 × 9
##   .model     M01AB M01AE N02BA N02BE  N05B  N05C   R03   R06
##   <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS(Sales)  19.4  26.1  14.6  190.  31.3  7.47  88.4  28.6
ets_mape <- ets_ac |> 
  select(.model, Drug, MAPE) |> 
  pivot_wider(names_from = Drug, values_from = MAPE)

ets_mape
## # A tibble: 1 × 9
##   .model     M01AB M01AE N02BA N02BE  N05B  N05C   R03   R06
##   <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS(Sales)  9.42  15.1  13.0  17.7  10.5  29.2  26.6  20.3

4. Prophet Models

## Prophet Models

# Fit the models
prophet_fit <- train |>
  model(prophet(Sales))

prophet_fit
## # A mable: 8 x 2
## # Key:     Drug [8]
##   Drug  `prophet(Sales)`
##   <chr>          <model>
## 1 M01AB        <prophet>
## 2 M01AE        <prophet>
## 3 N02BA        <prophet>
## 4 N02BE        <prophet>
## 5 N05B         <prophet>
## 6 N05C         <prophet>
## 7 R03          <prophet>
## 8 R06          <prophet>
# Generate forecasts for 52 weeks
prophet_fc <- prophet_fit |> forecast(h = 12)

# Plot forecasts against actual values
prophet_fc |>
  autoplot(test, level = NULL) +
  autolayer(
    filter_index(df_drug_monthly_fixed, "2018 Nov" ~ .),
    colour = "black"
  ) +
  labs(
    y = "Sales",
    title = "Prophet Forecasts for monthly Drug Sales (for test data)"
  ) +
  guides(colour = guide_legend(title = "Forecast")) +
  facet_wrap(vars(Drug), scales = "free_y", ncol = 2)
## Plot variable not specified, automatically selected `.vars = Sales`

# calculate accuracy
prophet_ac <- accuracy(prophet_fc, test)

# select RMSE and MAPE and convert it to wide data
prophet_rmse <- prophet_ac |> 
  select(.model, Drug, RMSE) |> 
  pivot_wider(names_from = Drug, values_from = RMSE)

prophet_rmse
## # A tibble: 1 × 9
##   .model         M01AB M01AE N02BA N02BE  N05B  N05C   R03   R06
##   <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 prophet(Sales)  32.4  26.2  24.2  212.  78.4  10.5  80.0  36.0
prophet_mape <- prophet_ac |> 
  select(.model, Drug, MAPE) |> 
  pivot_wider(names_from = Drug, values_from = MAPE)

prophet_mape
## # A tibble: 1 × 9
##   .model         M01AB M01AE N02BA N02BE  N05B  N05C   R03   R06
##   <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 prophet(Sales)  15.4  14.1  21.1  20.7  23.5  39.0  30.4  21.4

b. Model Comparison

## comparison
rmse_all <- rbind(base_rmse, arima_rmse, ets_rmse, prophet_rmse)
rmse_all
## # A tibble: 6 × 9
##   .model         M01AB M01AE N02BA N02BE  N05B  N05C   R03   R06
##   <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mean            18.1  26.2  31.4  312.  33.2  6.41 128.   49.3
## 2 Naïve           16.9  26.1  12.2  341.  31.3 10.8   98.2  48.3
## 3 Seasonal naïve  21.6  29.2  17.5  202.  45.6 11.5   89.9  37.8
## 4 ARIMA(Sales)    19.4  26.2  11.2  166.  34.1  6.41  75.5  27.8
## 5 ETS(Sales)      19.4  26.1  14.6  190.  31.3  7.47  88.4  28.6
## 6 prophet(Sales)  32.4  26.2  24.2  212.  78.4 10.5   80.0  36.0
mape_all <- rbind(base_mape, arima_mape, ets_mape, prophet_mape)
mape_all
## # A tibble: 6 × 9
##   .model         M01AB M01AE N02BA N02BE  N05B  N05C   R03   R06
##   <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mean            8.85  16.5  33.4  29.9  11.7  25.8  37.1  42.8
## 2 Naïve           8.44  14.4  10.1  37.5  10.5  42.3  38.4  43.7
## 3 Seasonal naïve  9.61  15.9  14.6  16.1  15.2  40.9  27.5  24.4
## 4 ARIMA(Sales)    9.41  15.6  10.5  13.3  12.0  25.8  24.8  20.8
## 5 ETS(Sales)      9.42  15.1  13.0  17.7  10.5  29.2  26.6  20.3
## 6 prophet(Sales) 15.4   14.1  21.1  20.7  23.5  39.0  30.4  21.4